Automatic Clause Boundary Annotation in the Hindi Treebank
Abstract
In this paper, we propose a method for automatic clause boundary annotation in the Hindi Dependency Treebank. We show that the clausal information implicitly encoded in a dependency structure can be made explicit with little or no human intervention. We applied the proposed approach to 16,000 sentences of the Hindi Dependency Treebank. Our approach achieves an accuracy of 94.44% for clause boundary identification, evaluated over 238 clauses. The resulting corpus has varied uses and can be utilized for developing a statistical clause boundary identifier.
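As a rough illustration of how clause boundaries might be read off a dependency structure, the sketch below assigns every token to its nearest verbal ancestor and reports the span each verb covers. This is not the authors' algorithm: the `Token` class, the `VERB` tag, the toy English sentence, and the grouping rule are all assumptions made for the example, and the treebank's own tagset and attachment conventions would need to be substituted.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int    # 1-based position in the sentence
    form: str   # surface word
    head: int   # idx of the governing token, 0 for the root
    pos: str    # part-of-speech tag


# Assumption: a single tag marks clause-heading verbs in this toy tagset.
VERB_TAGS = {"VERB"}


def clause_head(tok, by_idx):
    """Walk up the tree to the nearest verb; the token itself counts."""
    cur = tok
    while cur.pos not in VERB_TAGS and cur.head != 0:
        cur = by_idx[cur.head]
    return cur.idx


def clause_spans(tokens):
    """Map each clause head to the (first, last) token position it covers."""
    by_idx = {t.idx: t for t in tokens}
    groups = {}
    for t in tokens:
        groups.setdefault(clause_head(t, by_idx), []).append(t.idx)
    return {h: (min(m), max(m)) for h, m in groups.items()}


# Toy sentence (hypothetical analysis): "She said that he left"
sentence = [
    Token(1, "She", 2, "PRON"),
    Token(2, "said", 0, "VERB"),
    Token(3, "that", 5, "SCONJ"),
    Token(4, "he", 5, "PRON"),
    Token(5, "left", 2, "VERB"),
]

print(clause_spans(sentence))  # {2: (1, 2), 5: (3, 5)}
```

The sketch assumes a well-formed tree with no cycles. A clause embedded in the middle of another would produce overlapping spans here, so a real annotator would need an explicit rule for nesting.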
Similar resources
Coreference Annotation Scheme and Relation Types for Hindi
This paper describes a coreference annotation scheme for Hindi, together with annotation-specific issues and the solutions our proposed scheme offers for them. We introduce different coreference relation types between continuous mentions of the same coreference chain, such as 'Part-of', 'Function-value pair', etc. We used Jaccard-similarity-based Krippendorff's alpha to demonstrate consisten...
Automatic Extraction of Clause Relationships from a Treebank
The paper concentrates on deriving non-obvious information about clause structure of complex sentences from the Prague Dependency Treebank. Individual clauses and their mutual relationship are not explicitly annotated in the treebank, therefore it was necessary to develop an automatic method transforming the original annotation concentrating on the syntactic role of individual word forms into a...
Semantic Roles for Nominal Predicates: Building a Lexical Resource
The linguistic annotation of noun-verb complex predicates (also termed light verb constructions) is challenging, as these predicates are highly productive in Hindi. For semantic role labelling, each argument of the noun-verb complex predicate must be given a role label. For complex predicates, frame files need to be created specifying the role labels for each noun-verb complex predicate. The ...
A hybrid approach for automatic clause boundary identification in Hindi
A complex sentence, divided into clauses, can be analyzed more easily than the complex sentence itself. We present here the task of clause identification in Hindi text. To the best of our knowledge, not much work has been done on clause boundary identification for Hindi, which makes this task more important. We have built a hybrid system which gives F1-scores of 90.804% and 94.697% for...
Empty Categories in Hindi Dependency Treebank: Analysis and Recovery
In this paper, we first analyze and classify the empty categories in a Hindi dependency treebank and then identify various discovery procedures to automatically detect the existence of these categories in a sentence. For this, we make use of lexical knowledge along with the parsed output from a constraint-based parser. Through this work we show that it is possible to successfully discover certai...